11 research outputs found

    Data mining techniques for complex application domains

    The emergence of advanced communication techniques has increased the availability of large collections of data in electronic form in a number of application domains, including healthcare, e-business, and e-learning. Every day a large number of records are stored electronically. However, finding useful information in such large data collections is challenging. Data mining technology aims to automatically extract hidden knowledge from large data repositories by exploiting sophisticated algorithms. The hidden knowledge in electronic data may be used to improve the procedures, productivity, and reliability of several application domains. The PhD activity has focused on novel and effective data mining approaches to tackle the complex data coming from two main application domains: healthcare data analysis and textual data analysis. The research activity in the context of healthcare data addressed the application of different data mining techniques to discover valuable knowledge from real exam-log data of patients. In particular, efforts have been devoted to the extraction of medical pathways, which can be exploited to analyze the actual treatments followed by patients. The derived knowledge not only provides useful information for handling treatment procedures but may also play an important role in future predictions of potential patient risks associated with medical treatments. The research effort in textual data analysis is twofold. On the one hand, a novel approach to the discovery of succinct summaries of large document collections has been proposed. On the other hand, the suitability of an established descriptive data mining technique to support domain experts in making decisions has been investigated. Both research activities focus on adopting widely exploratory data mining techniques for textual data analysis, which requires overcoming intrinsic limitations of traditional algorithms in order to handle textual documents efficiently and effectively.

    MeTA: Characterization of medical treatments at different abstraction levels

    Physicians and healthcare organizations always collect large amounts of data during patient care. These large and high-dimensional datasets are usually characterized by an inherent sparseness. Hence, the analysis of these datasets to figure out interesting and hidden knowledge is a challenging task. This paper proposes a new data mining framework based on generalized association rules to discover multiple-level correlations among patient data. Specifically, correlations among prescribed examinations, drugs, and patient profiles are discovered and analyzed at different abstraction levels. The rule extraction process is driven by a taxonomy to generalize examinations and drugs into their corresponding categories. To ease the manual inspection of the result, a worthwhile subset of rules, i.e., the non-redundant generalized rules, is considered. Furthermore, rules are classified according to the involved data features (medical treatments or patient profiles) and then explored in a top-down fashion, i.e., from the small subset of high-level rules a drill-down is performed to target more specific rules. The experiments, performed on a real diabetic patient dataset, demonstrate the effectiveness of the proposed approach in discovering interesting rule groups at different abstraction levels.
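The idea of taxonomy-driven generalization described in this abstract can be illustrated with a minimal sketch. It is not the MeTA framework itself: the taxonomy, the patient records, and the helper names (`generalize`, `support`, `confidence`) are all hypothetical and chosen only to show how the same data yields rules at two abstraction levels.

```python
# Hypothetical taxonomy mapping concrete items to their categories.
TAXONOMY = {
    "metformin": "antidiabetic drug",
    "insulin": "antidiabetic drug",
    "HbA1c test": "blood exam",
    "glucose test": "blood exam",
}

# Hypothetical patient records: each record is the set of items prescribed together.
RECORDS = [
    {"metformin", "HbA1c test"},
    {"insulin", "glucose test"},
    {"metformin", "glucose test"},
    {"insulin", "HbA1c test"},
]


def generalize(record):
    """Extend a record with the taxonomy category of each of its items."""
    return record | {TAXONOMY[item] for item in record if item in TAXONOMY}


def support(itemset, records):
    """Fraction of records that contain every item of the itemset."""
    return sum(itemset <= record for record in records) / len(records)


def confidence(antecedent, consequent, records):
    """Confidence of the rule antecedent -> consequent."""
    return support(antecedent | consequent, records) / support(antecedent, records)


EXTENDED = [generalize(record) for record in RECORDS]

# Low-level itemset: sparse, supported by a single record.
low_level = support({"metformin", "HbA1c test"}, EXTENDED)
# High-level itemset: the same correlation expressed through categories.
high_level = support({"antidiabetic drug", "blood exam"}, EXTENDED)
# Mixed-level rule: a concrete drug implying an exam category.
mixed_conf = confidence({"metformin"}, {"blood exam"}, EXTENDED)
```

In this toy data each drug/exam pair is rare on its own, while the category-level itemset holds in every record, which is the kind of sparseness that generalization is meant to counter.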

    Detecting Tweet-Based Sentiment Polarity of Plastic Surgery Treatment

    Sentiment analysis is a growing research area. Many companies perform this analysis on public opinions to get a general idea about a product or service. This paper presents a novel approach to gather the views and comments of Twitter users about plastic surgery treatments. The proposed approach uses a machine-learning technique based on the naïve Bayesian classifier to assign polarities (i.e., positive, negative, or neutral) to tweets collected from the “Twitter micro-blogging website”. The accuracy of the obtained results has been validated using precision, recall, and F-score measures. It has been observed from a dataset of 25,000 tweets that people tend to have positive as well as substantial negative opinions regarding particular treatments. The experimental results show the effectiveness of the proposed approach.
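As a rough illustration of polarity assignment with a naïve Bayes classifier, the sketch below trains a multinomial model with add-one smoothing on a handful of invented tweets. The abstract does not state which naïve Bayes variant, features, or preprocessing the authors used, so those choices, the example texts, and the function names `train` and `predict` are assumptions.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word in vocabulary:  # words never seen in training are ignored
                count = word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy tweets, for illustration only.
TRAINING = [
    ("love my new nose great result", "positive"),
    ("so happy with the surgery", "positive"),
    ("terrible pain and regret", "negative"),
    ("botched surgery awful result", "negative"),
    ("clinic opens at nine", "neutral"),
    ("consultation scheduled for monday", "neutral"),
]

model = train(TRAINING)
label_a = predict(model, "great result so happy")
label_b = predict(model, "awful pain")
```

Scores are accumulated in log space so that long texts do not underflow; precision, recall, and F-score would then be computed by comparing such predictions against held-out labels.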

    A Digital Diary: Remembering the Past Using the Present Context

    Lifelog devices have gained much attention in the recent past. These devices are capable of recording daily activities of a user such as visited places, calories burnt, heart rate, etc. However, reminiscing about the past from this huge collection has received less attention. We aim to assist in remembering similar past events based on the present context or situation. We designed a prototype lifelog device that captures lifelogs in the form of pictures and audio, and associates them with the device wearer's context, such as people and objects in the vicinity. The information from the past may be used to help users in their current situation. In this article, we attempt to determine the type of context that device users prefer for remembering their past events. We found that users are more interested in finding lifelogs based on the people they meet at specific locations.

    Accessing Data Transfer Reliability for Duty Cycled Mobile Wireless Sensor Network

    Mobility in WSNs (Wireless Sensor Networks) introduces significant challenges which do not arise in static WSNs. Reliable data transport is an important aspect of attaining consistency and QoS (Quality of Service) in several applications of MWSNs (Mobile Wireless Sensor Networks). It is important to understand how each of the wireless sensor networking characteristics, such as duty cycling, collisions, contention, and mobility, affects the reliability of data transfer. If reliability is not managed well, the MWSN can suffer from overheads which reduce its applicability in the real world. In this paper, reliability is assessed by deploying an MWSN in different indoor and outdoor scenarios with various duty cycles of the motes and speeds of the mobile mote. Results show that reliability is greatly affected by the duty cycling of the motes and by mobility when using inherent broadcast mechanisms.

    A filter-based feature selection approach in multilabel classification

    Multilabel classification is a fast-growing field of machine learning. Recent developments have shown several applications of multilabel classification, including social media, healthcare, bio-molecular analysis, and scene and music classification. In such problems, multiple labels (more than one class label) are assigned to an unseen record instead of a single class label. Feature selection is a preprocessing phase used to identify the most relevant features that could improve the accuracy of multilabel classifiers. The focus of this study is the feature selection method in multilabel classification. The study used a filter-based feature selection method involving the Fisher score, analysis of variance test, mutual information, Chi-Square, and ensembles of these statistical methods. An extensive range of machine learning algorithms is applied in the modelling phase of a multilabel classification model, including binary relevance, classifier chain, label powerset, binary relevance KNN, multilabel twin support vector machine, and multilabel KNN. In addition, label space partitioning and majority voting of ensemble methods are used, with Random Forest as the base learner. The experiments are carried out over five different multilabel benchmarking datasets. The classification model is evaluated using accuracy, precision, recall, F1 score, and Hamming loss. The study demonstrated that the filter methods (i.e. mutual information) retaining the top-weighted 80% to 20% of features provided significant outcomes.
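A filter method of the kind named in this abstract can be sketched as follows: score each feature by its mutual information with the labels, then keep only the top-weighted fraction. The abstract does not say how scores are combined across labels, so averaging over labels is an assumption here, as are the toy data and the names `mutual_information` and `select_features`. Features are assumed to be discrete.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def select_features(X, Y, keep_fraction):
    """Rank features by mean MI over all labels and keep the top fraction.

    Averaging over labels is an assumption made for this sketch.
    Returns the sorted indices of the kept features and all scores.
    """
    n_features, n_labels = len(X[0]), len(Y[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        total = sum(
            mutual_information(column, [row[l] for row in Y])
            for l in range(n_labels)
        )
        scores.append(total / n_labels)
    k = max(1, round(keep_fraction * n_features))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


# Toy data: feature 0 mirrors label 0, feature 1 mirrors label 1,
# feature 2 is constant and therefore carries no information.
X = [[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]]
Y = [[0, 0], [0, 1], [1, 0], [1, 1]]

selected, scores = select_features(X, Y, keep_fraction=2 / 3)
```

Because a filter scores features independently of any classifier, the same reduced feature set can be passed to any of the multilabel learners listed in the abstract.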

    An Exploratory Study of Software Sustainability at Early Stages of Software Development

    Incorporating sustainability within Software Engineering is an emerging research area. Sustainability has been addressed to a large extent from an academic perspective. However, in the software industry the topic has not received much-needed attention. Software designed and developed in the industry can benefit society at large if software professionals take sustainability into account during the design and development process. Knowledge and awareness of sustainability among professional software developers is one of the key elements in developing a sustainable software application. This study attempts to examine sustainability knowledge, importance, and support from the perspective of South Asian software professionals. Additionally, it proposes sustainability guidelines for certain software applications and a catalog for identifying sustainability requirements for different software applications. Questions such as ‘What does sustainability mean to a professional software developer?’, ‘How does the software industry identify sustainability requirements?’, and ‘How do software developers incorporate the sustainability parameters within software during software development?’, among others, are addressed in this study. To achieve this goal, a survey was carried out among 221 industry practitioners involved in software projects in various application domains such as banking, finance, and management applications. The results show that even though sustainability is deemed important by 91% of practitioners, there is still a lack of understanding regarding sustainability incorporation in software development. A total of 48% of professionals often misunderstand “Green software” as “sustainable software”. The technical aspect of sustainability is considered most important by professionals (67%) as well as companies (77%). One of the key findings of this study is that 92% of software practitioners are not able to identify sustainability requirements for software applications. The outcomes of the study may be regarded as an initial attempt to understand how sustainability is comprehended in software by the South Asian software industry.

    Extraction of medical pathways from electronic patient records

    With the introduction of electronic medical records, a large amount of patients' medical data has become available. A current problem in this domain is to reverse engineer the medical treatment process in order to highlight the medical pathways typically adopted for specific health conditions. This chapter addresses the ability of sequential data mining techniques to reconstruct the actual medical pathways followed by patients. Detected medical pathways take the form of sets of exams frequently done together, sequences of exam sets frequently followed by patients, and frequent correlations between exam sets. The analysis shows that the majority of the extracted pathways are consistent with the medical guidelines, but also reveals some unexpected results, which can be useful both to enrich existing guidelines and to improve the public health service.
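The first kind of pathway mentioned in this abstract, sets of exams frequently done together, can be sketched with a simple support count. This brute-force enumeration is not the mining algorithm used in the chapter, which is not specified here; the visit data, the threshold, and the function name `frequent_exam_sets` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_exam_sets(visits, min_support, max_size=2):
    """Return exam sets (as sorted tuples) whose support meets min_support.

    Support is the fraction of visits containing all exams in the set.
    Every subset up to max_size is enumerated, which is only practical
    for small inputs.
    """
    counts = Counter()
    for exams in visits:
        for size in range(1, max_size + 1):
            for subset in combinations(sorted(exams), size):
                counts[subset] += 1
    n = len(visits)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


# Hypothetical exam log: one set of exams per patient visit.
VISITS = [
    {"glucose", "HbA1c"},
    {"glucose", "HbA1c", "eye exam"},
    {"glucose"},
    {"HbA1c", "glucose"},
]

frequent = frequent_exam_sets(VISITS, min_support=0.5)
```

Sequences of exam sets would additionally require ordering the visits of each patient by date, which this sketch leaves out.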